Affinity-Based Reinforcement Learning: A New Paradigm for Agent Interpretability
The steady increase in complexity of reinforcement learning (RL) algorithms is accompanied by a corresponding increase in opacity that obfuscates insights into their devised strategies. Methods in explainable artificial intelligence seek to mitigate this opacity by either creating transparent algorithms or extracting explanations post hoc. A third category exists that allows the developer to affect what agents learn: constrained RL has been used in safety-critical applications and prohibits agents from visiting certain states; preference-based RL agents have been used in robotics applications and learn state-action preferences instead of traditional reward functions. We propose a new affinity-based RL paradigm in which agents learn strategies that are partially decoupled from reward functions. Unlike entropy regularisation, we regularise the objective function with a distinct action distribution that represents a desired behaviour; we encourage the agent to act according to a prior while learning to maximise rewards. The result is an inherently interpretable agent that solves problems with an intrinsic affinity for certain actions. We demonstrate the utility of our method in a financial application: we learn continuous time-variant compositions of prototypical policies, each interpretable by its action affinities, that are globally interpretable according to customers’ financial personalities.
Our method combines advantages from both constrained RL and preference-based RL: it retains the reward function but generalises the policy to match a defined behaviour, thus avoiding problems such as reward shaping and hacking. Unlike Boolean task composition, our method is a fuzzy superposition of different prototypical strategies to arrive at a more complex, yet interpretable, strategy.
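The regularisation idea described above can be sketched numerically: the objective keeps the reward term but adds a penalty that pulls the learned action distribution towards a desired prior. The following is a minimal illustration under assumed names and an assumed KL-divergence penalty form, not the paper's implementation.

```python
import numpy as np

def affinity_regularized_objective(expected_return, policy, prior, lam=0.1):
    """Illustrative affinity-regularised objective: reward term minus a
    KL penalty pulling `policy` towards the desired `prior`.
    (Function name, penalty form, and lam are assumptions for this sketch.)"""
    policy = np.asarray(policy, dtype=float)
    prior = np.asarray(prior, dtype=float)
    kl = np.sum(policy * np.log(policy / prior))  # KL(policy || prior)
    return expected_return - lam * kl

# A policy that matches the prior pays no penalty...
uniform_prior = np.array([0.25, 0.25, 0.25, 0.25])
aligned = affinity_regularized_objective(1.0, uniform_prior, uniform_prior)
# ...while a policy concentrated on one action is penalised.
greedy = affinity_regularized_objective(
    1.0, np.array([0.97, 0.01, 0.01, 0.01]), uniform_prior)
assert aligned > greedy
```

Unlike entropy regularisation, which pushes towards the uniform distribution, the prior here can encode any desired behaviour, which is what makes the resulting affinities interpretable.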
Reinforcement Learning with Intrinsic Affinity for Personalized Asset Management
The common purpose of applying reinforcement learning (RL) to asset management is the maximization of profit. The extrinsic reward function used to learn an optimal strategy typically does not take into account any other preferences or constraints. We have developed a regularization method that ensures that strategies have global intrinsic affinities, i.e., different personalities may have preferences for certain assets which may change over time. We capitalize on these intrinsic policy affinities to make our RL model inherently interpretable. We demonstrate how RL agents can be trained to orchestrate such individual policies for particular personality profiles and still achieve high returns.
Reinforcement learning with intrinsic affinity for personalized prosperity management
The purpose of applying reinforcement learning (RL) to portfolio management is commonly the maximization of profit. The extrinsic reward function used to learn an optimal strategy typically does not take into account any other preferences or constraints. We have developed a regularization method that ensures that strategies have global intrinsic affinities, i.e., different personalities may have preferences for certain asset classes which may change over time. We capitalize on these intrinsic policy affinities to make our RL model inherently interpretable. We demonstrate how RL agents can be trained to orchestrate such individual policies for particular personality profiles and still achieve high returns.
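The orchestration of individual policies for a personality profile can be pictured as a convex combination of prototypical action distributions. The sketch below is an assumption about how such a composition might look (names and weights are illustrative, not taken from the paper).

```python
import numpy as np

def superpose(prototype_policies, weights):
    """Blend prototypical action distributions with continuous weights.
    A convex combination of valid distributions is itself a valid
    distribution, so the blended policy stays well-formed.
    (This is an illustrative sketch, not the authors' implementation.)"""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()                     # normalise mixture
    return weights @ np.asarray(prototype_policies, dtype=float)

# Two interpretable prototypes over three asset classes:
cautious = [0.7, 0.2, 0.1]     # affinity for the safest asset class
aggressive = [0.1, 0.2, 0.7]   # affinity for the riskiest asset class
mix = superpose([cautious, aggressive], weights=[0.75, 0.25])
assert abs(mix.sum() - 1.0) < 1e-9   # still a probability distribution
```

Because each prototype is interpretable by its affinities, the mixture weights themselves read as a description of the customer's personality profile.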
Clustering in Recurrent Neural Networks for Micro-Segmentation using Spending Personality
Author's accepted manuscript. © 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Customer segmentation has long been a productive field in banking. However, with new approaches to traditional problems come new opportunities. Fine-grained customer segments are notoriously elusive and one method of obtaining them is through feature extraction. It is possible to assign coefficients of standard personality traits to financial transaction classes aggregated over time. However, we have found that the clusters formed are not sufficiently discriminatory for micro-segmentation. In a novel approach, we extract temporal features with continuous values from the hidden states of neural networks predicting customers' spending personality from their financial transactions. We consider both temporal and non-sequential models, using long short-term memory (LSTM) and feed-forward neural networks, respectively. We found that recurrent neural networks produce micro-segments where feed-forward networks produce only coarse segments. Finally, we show that classification using these extracted features performs at least as well as bespoke models on two common metrics, namely loan default rate and customer liquidity index.
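The feature-extraction pipeline described above, taking the hidden state of a recurrent model as a continuous feature vector per customer and clustering on it, can be sketched as follows. This uses a plain tanh recurrence and a tiny k-means rather than the paper's LSTM and clustering setup; all shapes, weights, and data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def rnn_hidden_states(sequences, W_x, W_h):
    """Run a minimal recurrent forward pass over each transaction sequence
    and return the final hidden state as that customer's feature vector.
    (A sketch of the idea only; the paper uses trained LSTMs.)"""
    feats = []
    for seq in sequences:
        h = np.zeros(W_h.shape[0])
        for x in seq:
            h = np.tanh(W_x @ x + W_h @ h)  # simple tanh recurrence
        feats.append(h)
    return np.array(feats)

def kmeans(X, k, iters=20):
    """Tiny k-means to illustrate segmenting on the extracted features."""
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Toy "transaction" sequences for 6 customers (4 timesteps, 3 categories).
seqs = rng.normal(size=(6, 4, 3))
W_x = rng.normal(scale=0.5, size=(5, 3))  # input-to-hidden weights
W_h = rng.normal(scale=0.5, size=(5, 5))  # hidden-to-hidden weights
features = rnn_hidden_states(seqs, W_x, W_h)
segments = kmeans(features, k=2)
assert features.shape == (6, 5)
```

The key point the study makes is that these recurrent hidden states are continuous and temporal, so clustering them yields finer-grained segments than clustering aggregated trait coefficients.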
Reinforcement Learning Your Way: Agent Characterization through Policy Regularization
The increased complexity of state-of-the-art reinforcement learning (RL) algorithms has resulted in an opacity that inhibits explainability and understanding. This has led to the development of several post hoc explainability methods that aim to extract information from learned policies, thus aiding explainability. These methods rely on empirical observations of the policy, and thus aim to generalize a characterization of agents’ behaviour. In this study, we have instead developed a method to imbue agents’ policies with a characteristic behaviour through regularization of their objective functions. Our method guides the agents’ behaviour during learning, which results in an intrinsic characterization; it connects the learning process with model explanation. We provide a formal argument and empirical evidence for the viability of our method. In future work, we intend to employ it to develop agents that optimize individual financial customers’ investment portfolios based on their spending personalities.
Can Interpretable Reinforcement Learning Manage Prosperity Your Way?
Personalisation of products and services is fast becoming the driver of success in banking and commerce. Machine learning holds the promise of gaining a deeper understanding of and tailoring to customers’ needs and preferences. Whereas traditional solutions to financial decision problems frequently rely on model assumptions, reinforcement learning is able to exploit large amounts of data to improve customer modelling and decision-making in complex financial environments with fewer assumptions. Model explainability and interpretability present challenges from a regulatory perspective, which demands transparency for acceptance; they also offer the opportunity for improved insight into and understanding of customers. Post hoc approaches are typically used for explaining pretrained reinforcement learning models. Based on our previous modelling of customer spending behaviour, we adapt our recent reinforcement learning algorithm that intrinsically characterizes desirable behaviours, and we transition to the problem of prosperity management. We train inherently interpretable reinforcement learning agents to give investment advice that is aligned with prototype financial personality traits, which are combined to make a final recommendation. We observe that the trained agents’ advice adheres to their intended characteristics, that they learn the value of compound growth and, without any explicit reference, the notion of risk, and that policy convergence improves.
Towards Responsible AI for Financial Transactions
Author's accepted manuscript. © 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
The application of AI in finance is increasingly dependent on the principles of responsible AI. These principles (explainability, fairness, privacy, accountability, transparency and soundness) form the basis for trust in future AI systems. In this empirical study, we address the first principle by providing an explanation for a deep neural network that is trained on a mixture of numerical, categorical and textual inputs for financial transaction classification. The explanation is achieved through (1) a feature importance analysis using Shapley additive explanations (SHAP) and (2) a hybrid approach of text clustering and decision tree classifiers. We then test the robustness of the model by exposing it to a targeted evasion attack, leveraging the knowledge we gained about the model through the extracted explanation.
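The Shapley-value idea behind SHAP can be made concrete on a tiny model: each feature's attribution is its average marginal contribution over all coalitions of the other features. The brute-force sketch below only illustrates that idea (the SHAP library approximates it efficiently for real models); the linear "classifier" and its weights are invented for checkability.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley attributions by enumerating all feature coalitions.
    Feasible only for a handful of features; SHAP approximates this."""
    n = len(x)
    phi = [0.0] * n
    features = list(range(n))
    for i in features:
        others = [j for j in features if j != i]
        for r in range(len(others) + 1):
            for S in combinations(others, r):
                # Weight of this coalition in the Shapley formula.
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                with_i = [x[j] if (j in S or j == i) else baseline[j]
                          for j in features]
                without = [x[j] if j in S else baseline[j] for j in features]
                phi[i] += w * (model(with_i) - model(without))
    return phi

# A transparent linear "transaction scorer" so attributions are checkable.
model = lambda v: 2.0 * v[0] + 1.0 * v[1] - 3.0 * v[2]
phi = shapley_values(model, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
# For a linear model with a zero baseline, each Shapley value equals the
# feature's marginal effect: [2.0, 1.0, -3.0].
assert [round(p, 6) for p in phi] == [2.0, 1.0, -3.0]
```

Knowing which features carry the most attribution is exactly what makes the targeted evasion attack in the study possible: the attacker perturbs the inputs the explanation identifies as most influential.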
Diagnostic monitoring of dynamic systems using artificial immune systems
Thesis (MScEng (Process Engineering))--University of Stellenbosch, 2006.
The natural immune system is an exceptional pattern recognition system based on memory and learning that is capable of detecting both known and unknown pathogens. Artificial immune systems (AIS) employ some of the functionalities of the natural immune system in detecting change in dynamic process systems. The emerging field of artificial immune systems has enormous potential for application in fault detection systems in process engineering.
This thesis aims firstly to familiarise the reader with the various current methods in the field of fault detection and identification. Secondly, the notion of artificial immune systems is introduced and explained. Finally, this thesis investigates the performance of AIS on data gathered from simulated case studies, both with and without noise.
Three different methods of generating detectors are used to monitor various processes for anomalous events. These are:
(1) random generation of detectors,
(2) convex hulls,
(3) the hypercube vertex approach.
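The first of these methods, random generation of detectors, follows the negative-selection principle: candidate detectors that match normal "self" data are discarded, so the surviving detectors cover only the non-self (anomalous) region. The sketch below illustrates this under assumed toy parameters (detector radius, data ranges); it is not the thesis implementation.

```python
import random

random.seed(1)

def dist(a, b):
    """Euclidean distance between two 2-D points."""
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

def generate_detectors(self_samples, n_detectors, radius=0.15):
    """Negative selection: keep only random candidate detectors that do
    not match (lie within `radius` of) any normal 'self' sample."""
    detectors = []
    while len(detectors) < n_detectors:
        cand = (random.random(), random.random())
        if all(dist(cand, s) > radius for s in self_samples):
            detectors.append(cand)
    return detectors

def is_anomalous(point, detectors, radius=0.15):
    """A point is flagged when any surviving detector matches it."""
    return any(dist(point, d) <= radius for d in detectors)

# Normal operating data: a dense grid around the process set point (0.5, 0.5).
self_data = [(0.45 + 0.01 * i, 0.45 + 0.01 * j)
             for i in range(11) for j in range(11)]
detectors = generate_detectors(self_data, n_detectors=200)
assert not is_anomalous((0.5, 0.5), detectors)  # normal point passes
```

Because detectors are placed blindly, coverage of the non-self space is uneven, which is consistent with the thesis's finding that random generation detects reasonably well but is outperformed by the hypercube vertex approach.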
It is found that random generation provides a reasonable rate of detection, while convex hulls fail to achieve the required objectives. The hypercube vertex method achieved the highest detection rate and the lowest false alarm rate in all case studies. The hypercube vertex method originates from this project and is the recommended method for all real-valued systems, at least those with a small number of variables.
It is found that, in some cases, AIS are capable of perfect classification, where 100% of anomalous events are identified and no false alarms are generated. As expected, noise has some effect on the detection capability in all case studies. The computational cost of the various methods is compared; it is concluded that the hypercube vertex method has a higher cost than the other methods researched. This increased computational cost does not, however, exceed reasonable confines, and the hypercube vertex method therefore remains the chosen method.
The thesis concludes by considering the performance of AIS against the comparative criteria for diagnostic methods. It is found that AIS compare well to current methods, that some of the limitations of those methods are indeed overcome, and that their abilities are surpassed in certain cases. Recommendations are made for future study in the field of AIS. Furthermore, the use of the hypercube vertex method is highly recommended in real-valued scenarios such as process engineering.